Add fast path for single value in VALUES aggregator #130510

dnhatn · 2025-07-02T22:56:31Z

This change introduces a fast path for the VALUES aggregator in the single-value case. For the first value seen in each group, we add it to a new big array without touching the hash. For subsequent values, if they are the same as the group's current value, we skip them; if they differ, we trigger the slow path and add them to the hash. This optimization speeds up VALUES when the number of groups is large and most groups have only one value.
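The idea described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class `SingleValueFastPath`, its fields, and its methods are hypothetical names, a plain `String[]` stands in for the big array, and a `HashMap` of sets stands in for the hash. It also assumes values are never null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a single-value fast path; not the Elasticsearch implementation. */
class SingleValueFastPath {
    // First value seen per group, indexed by group id; null means "no value yet".
    private String[] firstValues = new String[8];
    // Slow path: only groups that saw a second, different value get an entry here.
    private final Map<Integer, Set<String>> extraValues = new HashMap<>();

    void add(int group, String value) {
        if (group >= firstValues.length) {
            // Grow the array so the group id is a valid index.
            firstValues = Arrays.copyOf(firstValues, Math.max(group + 1, firstValues.length * 2));
        }
        String first = firstValues[group];
        if (first == null) {
            firstValues[group] = value; // fast path: first value, hash untouched
            return;
        }
        if (first.equals(value)) {
            return; // same as the first value: skip, hash untouched
        }
        // Slow path: a different value goes to the hash (duplicates are deduplicated there).
        extraValues.computeIfAbsent(group, g -> new LinkedHashSet<>()).add(value);
    }

    /** Distinct values for a group: the first value, then any slow-path values. */
    List<String> values(int group) {
        List<String> out = new ArrayList<>();
        if (group < firstValues.length && firstValues[group] != null) {
            out.add(firstValues[group]);
            Set<String> extra = extraValues.get(group);
            if (extra != null) {
                out.addAll(extra);
            }
        }
        return out;
    }

    /** Number of groups that ever needed the slow path. */
    int slowPathGroups() {
        return extraValues.size();
    }
}
```

In this sketch, a workload where most groups hold a single value never populates `extraValues`, which is the case the benchmark with 1,000,000 groups exercises.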

Before:

Benchmark                      (dataType)  (groups)  Mode  Cnt      Score       Error  Units
ValuesAggregatorBenchmark.run    BytesRef         1  avgt    3    177.756 ±     2.111  ms/op
ValuesAggregatorBenchmark.run    BytesRef      1000  avgt    3    126.174 ±     0.431  ms/op
ValuesAggregatorBenchmark.run    BytesRef   1000000  avgt    3  66920.144 ± 53588.490  ms/op

After:

Benchmark                      (dataType)  (groups)  Mode  Cnt      Score      Error  Units
ValuesAggregatorBenchmark.run    BytesRef         1  avgt    3    180.269 ±    4.019  ms/op
ValuesAggregatorBenchmark.run    BytesRef      1000  avgt    3    107.051 ±    3.149  ms/op
ValuesAggregatorBenchmark.run    BytesRef   1000000  avgt    3  26277.863 ± 7214.319  ms/op

elasticsearchmachine · 2025-07-03T04:51:35Z

Hi @dnhatn, I've created a changelog YAML for you.

elasticsearchmachine · 2025-07-03T04:54:12Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

nik9000

I like the idea of collecting the first values into only the firstValues array. So we only go to the hash if there's more than one value. I'll have another look in the morning. I think it's correct, but I'll need another scan to make sure.

nik9000 · 2025-07-07T20:43:23Z

...e/src/main/generated-src/org/elasticsearch/compute/aggregation/ValuesBytesRefAggregator.java

@@ -90,6 +91,14 @@ public static void combineIntermediate(GroupingState state, int groupId, BytesRe

    public static void combineStates(GroupingState current, int currentGroupId, GroupingState state, int statePosition) {
        BytesRef scratch = new BytesRef();


Scratch can go below the quick bail outs.

nik9000 · 2025-07-07T21:06:46Z

...e/src/main/generated-src/org/elasticsearch/compute/aggregation/ValuesBytesRefAggregator.java

+                    for (int s = 0; s < selected.getPositionCount(); s++) {
+                        int group = selected.getInt(s);
+                        int count = -selectedCounts[group];
+                        selectedCounts[group] = total;


This isn't the counts any more - it's the counts in values, which is one less than the actual number of values. That's tricky. I'm not entirely sure what's correct here.

Similar to #127849, this change adds an optimized path for leveraging ordinal blocks of intermediate input pages in the Values aggregator. Below are the micro-benchmark results.

Before:

```
// 1 raw input page + 1000 intermediate input pages
Benchmark                      (dataType)  (groups)  Mode  Cnt       Score   Error  Units
ValuesAggregatorBenchmark.run    BytesRef         1  avgt    2       0.382          ms/op
ValuesAggregatorBenchmark.run    BytesRef      1000  avgt    2     112.293          ms/op
ValuesAggregatorBenchmark.run    BytesRef   1000000  avgt    2  113182.908          ms/op
```

After:

```
// 1 raw input page + 1000 intermediate input pages
Benchmark                      (dataType)  (groups)  Mode  Cnt      Score   Error  Units
ValuesAggregatorBenchmark.run    BytesRef         1  avgt    2      0.378          ms/op
ValuesAggregatorBenchmark.run    BytesRef      1000  avgt    2     34.410          ms/op
ValuesAggregatorBenchmark.run    BytesRef   1000000  avgt    2  64654.830          ms/op
```

1K groups: 112 ms -> 34.4 ms
1M groups: 113 s -> 64 s

More to come with #130510

Relates #127849

elasticsearchmachine added the v9.2.0 label Jul 2, 2025

dnhatn force-pushed the single-value-path-for-values branch 2 times, most recently from 61f3ad6 to c1d8eb7 · July 3, 2025 04:21

Add fast path for single value in VALUES

5860c6e

dnhatn force-pushed the single-value-path-for-values branch from c1d8eb7 to 5860c6e · July 3, 2025 04:29

dnhatn added >enhancement :Analytics/ES|QL AKA ESQL labels Jul 3, 2025

Update docs/changelog/130510.yaml

90c3463

github-actions bot deployed to docs-preview July 3, 2025 04:52

naming

5a7a5c7

dnhatn requested a review from nik9000 July 3, 2025 04:53

dnhatn marked this pull request as ready for review July 3, 2025 04:53

github-actions bot deployed to docs-preview July 3, 2025 04:54

elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jul 3, 2025

nik9000 reviewed Jul 7, 2025


dnhatn mentioned this pull request Jul 17, 2025

Add optimized path for intermediate values aggregator #131390

Merged
